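Let $x_1, \dots, x_n$ be i.i.d. realizations from probability mass function $p(x \mid \theta)$ (if discrete), or from density $f(x \mid \theta)$ (if continuous), where $\Theta$ is the random variable representing the parameter (or vector of parameters). We define the Maximum A Posteriori (MAP) estimator of $\theta$ to be the parameter value that maximizes the posterior distribution of $\Theta$ given the data:
$$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \, f(\theta \mid x_1, \dots, x_n) = \arg\max_{\theta} \, g(\theta) \prod_{i=1}^{n} f(x_i \mid \theta),$$
where $g(\theta)$ is the prior density of $\Theta$.
(This is the same as maximum likelihood estimation, except that instead of maximizing the likelihood, it maximizes the likelihood multiplied by the prior.)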
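Equivalently, $\hat{\theta}_{\text{MAP}} = \arg\min_{\theta} L(\theta)$, where the loss function is $L(\theta) = -\log g(\theta) - \sum_{i=1}^{n} \log f(x_i \mid \theta)$ for i.i.d. $x_1, \dots, x_n$ and prior density $g(\theta)$ (compare maximum likelihood estimation, whose loss is the same sum without the prior term $-\log g(\theta)$).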
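To make "likelihood multiplied by prior" concrete, here is a minimal sketch under assumptions that are not in the notes above: the data are Bernoulli coin flips with unknown success probability `theta`, and the prior on `theta` is Beta(`a`, `b`). All function names are hypothetical. It minimizes the negative log posterior on a grid and compares the result with the known closed-form mode of the Beta posterior.

```python
import math

def neg_log_posterior(theta, data, a, b):
    """Loss = -log prior - sum of log likelihoods (additive constants dropped)."""
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    log_lik = sum(math.log(theta) if x == 1 else math.log(1 - theta) for x in data)
    return -(log_prior + log_lik)

def map_estimate_grid(data, a, b, steps=10_000):
    """Brute-force MAP: evaluate the loss at grid midpoints strictly inside (0, 1)."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    return min(grid, key=lambda t: neg_log_posterior(t, data, a, b))

def map_estimate_closed_form(data, a, b):
    """Mode of the Beta(a + k, b + n - k) posterior; valid for a, b > 1."""
    k, n = sum(data), len(data)
    return (k + a - 1) / (n + a + b - 2)

# Assumed toy data: 6 successes in 8 trials, with an assumed Beta(2, 2) prior.
data = [1, 1, 1, 0, 1, 1, 0, 1]
a, b = 2.0, 2.0

mle = sum(data) / len(data)                      # maximum likelihood: k / n
map_closed = map_estimate_closed_form(data, a, b)
map_grid = map_estimate_grid(data, a, b)

print("MLE:", mle)
print("MAP (closed form):", map_closed)
print("MAP (grid search):", map_grid)
```

With this toy data the MLE is 6/8 and the closed-form MAP is 7/10: the prior pulls the estimate toward 0.5. With a flat Beta(1, 1) prior the prior term vanishes and the MAP estimate coincides with the MLE.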
The estimate of $w$, $\hat{w}$, from the noisy observation $y$ depends on the observed (noisy) value $y$ and is therefore also denoted $\hat{w}(y)$. To obtain the estimate $\hat{w}(y)$, we will use the maximum a posteriori (MAP) estimator. The MAP estimator is based on the probability density function (pdf) of $w$. Specifically, given an observed value $y$, the MAP estimator asks: what value of $w$ is most likely? That is, the MAP estimator looks for the value of $w$ where the probability of $w$ is highest; it looks for the peak value. Therefore, the MAP estimator is defined as
$$\hat{w}(y) = \arg\max_{w} \, p_{w \mid y}(w \mid y),$$
where '$\arg\max$' is the value of the argument where the function has its maximum. The pdf $p_{w \mid y}(w \mid y)$ is the distribution of $w$ given a specific value $y$, where, by Bayes' rule,
$$p_{w \mid y}(w \mid y) = \frac{p_{y \mid w}(y \mid w) \, p_w(w)}{p_y(y)}.$$
(The MAP estimate is the point where the pdf of $w$, for a given value of $y$, has its peak.)
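The "look for the peak" idea can be sketched numerically. The model below is an assumption for illustration, not something stated above: the observation is $y = w + n$ with zero-mean Gaussian noise of standard deviation `sigma_n`, and $w$ has a zero-mean Laplacian prior with standard deviation `sigma_w`. Under these assumptions the peak of the posterior is known to be given by the soft-thresholding rule with threshold $\sqrt{2}\,\sigma_n^2 / \sigma_w$, so the grid search can be checked against it. Function names are hypothetical.

```python
import math

def log_posterior(w, y, sigma_n, sigma_w):
    """log p(y | w) + log p(w), up to constants that do not depend on w."""
    log_lik = -((y - w) ** 2) / (2 * sigma_n ** 2)    # assumed Gaussian noise
    log_prior = -math.sqrt(2) * abs(w) / sigma_w      # assumed Laplacian prior
    return log_lik + log_prior

def map_denoise_grid(y, sigma_n, sigma_w, lo=-10.0, hi=10.0, steps=20_001):
    """Brute-force MAP: return the grid value of w where the posterior peaks."""
    step = (hi - lo) / (steps - 1)
    grid = [lo + i * step for i in range(steps)]
    return max(grid, key=lambda w: log_posterior(w, y, sigma_n, sigma_w))

def soft_threshold(y, t):
    """Shrink y toward zero by t; values with |y| <= t map to zero."""
    return math.copysign(max(abs(y) - t, 0.0), y)

sigma_n, sigma_w = 1.0, 2.0                       # assumed noise and signal scales
threshold = math.sqrt(2) * sigma_n ** 2 / sigma_w

for y in (3.0, 0.5, -2.0):
    w_grid = map_denoise_grid(y, sigma_n, sigma_w)
    w_soft = soft_threshold(y, threshold)
    print(f"y={y:+.2f}  grid MAP={w_grid:+.4f}  soft threshold={w_soft:+.4f}")
```

The grid search is only accurate to the grid spacing (0.001 here), and it assumes the peak lies inside `[lo, hi]`. Small observations are set to zero and large ones are shrunk by the threshold.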
If
#incomplete
Soft thresholding softmax
References: